-
Notifications
You must be signed in to change notification settings - Fork 78
Sponge state remapping #755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| fn key_to_leaf_index(key: &Word) -> LeafIndex<SMT_DEPTH> { | ||
| let most_significant_felt = key[3]; | ||
| LeafIndex::new_max_depth(most_significant_felt.as_canonical_u64()) | ||
| let least_significant_felt = key[0]; | ||
| LeafIndex::new_max_depth(least_significant_felt.as_canonical_u64()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure we should change this. Conceptually, if we interpret the word as a 256-bit integer, the most significant bits are the ones that define the path of the first 64 levels of the tree. So, it may still make sense to keep the most significant element define the leaf index.
What's the main motivation for changing this? Is it to avoid doing dup.3 when trying to get the index for SMTs in MASM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so in our trees we are encoding the paths from MSB to LSB and hence, as you suggested, key[3] is the right choice and hence this should be reverted.
Though, I would claim that MSB to LSB is not intuitive and the LSB to MSB makes more sense, any reasons for picking one over the other ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is mostly an extension of thinking of leaf indexes as integers. For example, if we have a tree of depth 64, the first leaf would be at index 0, the second one at index 1, and the last at index u64::MAX. This naturally means that the most significant bit specifies if the leaf is in the left or right subtree immediately under the root, and the next most significant bit specifies the next subtree etc.
If we extended this to a tree of depth 256, each leaf position would be encoded as a 256-bit integer. And then, again, the most significant would define left/right subtree under the root etc.
Since we assume that the layout of 256-bit integers would be in little-endian form, then most significant bits would be located in key[3]. We could, of course, keep them in key[0] but then:
There is a slight inconsistency in where the relevant bits are. For example, if we write the key as bytes (in little endian form), i.e., 32 bytes - the byte with most significant bits would be at key_bytes[7] which is somewhat counterintuitive.
Contrast this with key[3] - here, most significant bits would be in key_bytes[31]. Which I think is a bit more consistent.
Also, keeping things as is (i.e., using key[3]) should probably result in fewer changes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, there are only two consistent choices here, either MSB or LSB throughout the 256-bit integer (including within the individual limbs).
To me LSB to MSB is the most intuitive as then we get
- the first limb
K[0]is the most accessible on the op stack - bit 0 decides the root's children, while bit 255 decides the leaves of the tree.
Of course, these will imply a number of changes but nothing major as most of these have already been done.
In any case, the changes are reverted for now and we can come back to this question in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2¢: I agree that it's generally more intuitive to order the leaves in "natural" order (i.e. MSB to LSB), though I've often encountered situations where algorithms can be a bit faster if we use "bit-reversed" order. However, the latter is less intuitive and led me down the wrong path a few times. I don't have a strong opinion here, but we could discuss this in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, though for the VM I would always err on the side of consistency and readability provided the cost for doing so is manageable.
In my mind, this is the last remaining point that is user-facing in order to bring a coherent orientation to the VM, I believe.
5571ba1 to
8dfedbe
Compare
56ef32e to
6eed157
Compare
7907951 to
5cc55be
Compare
| /// The first and second 4-element words of the rate portion. | ||
| pub(crate) const INPUT1_RANGE: Range<usize> = 0..4; | ||
| pub(crate) const INPUT2_RANGE: Range<usize> = 4..8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be worth exposing these ranges publicly. I'll make a small commit to do this.
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you! I left a couple of comment-related comments inline.
adr1anh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
| fn key_to_leaf_index(key: &Word) -> LeafIndex<SMT_DEPTH> { | ||
| let most_significant_felt = key[3]; | ||
| LeafIndex::new_max_depth(most_significant_felt.as_canonical_u64()) | ||
| let least_significant_felt = key[0]; | ||
| LeafIndex::new_max_depth(least_significant_felt.as_canonical_u64()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2¢: I agree that it's generally more intuitive to order the leaves in "natural" order (i.e. MSB to LSB), though I've often encountered situations where algorithms can be a bit faster if we use "bit-reversed" order. However, the latter is less intuitive and led me down the wrong path a few times. I don't have a strong opinion here, but we could discuss this in a separate issue.
plafer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| /// The first 4-element word of the rate portion. | ||
| pub const INPUT1_RANGE: Range<usize> = INPUT1_RANGE; | ||
|
|
||
| /// The second 4-element word of the rate portion. | ||
| pub const INPUT2_RANGE: Range<usize> = INPUT2_RANGE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use "rate" everywhere else, why do we use "input" here? Wouldn't it be clearer/more consistent to use RATE0_RANGE and RATE1_RANGE?
e.g. we have STATE_RATE_0_RANGE and STATE_RATE_1_RANGE in the VM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your point is valid, but the use of input is sometimes useful when for example we have a left child and a right child of a Merkle tree node. We can then say that the left child is input1 and the right child is input2.
Went ahead and changed toRATE0 and RATE1 to stay consistent both here and in the VM.
4c62719 to
1d739ec
Compare
Describe your changes
Addresses #673
Checklist before requesting a review
nextaccording to naming convention.